NSF PAR Search | NSF Public Access Repository


Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos

Majumder, S; Nagarajan, T; Al-Halah, Z; Grauman, K (April 2025, https://doi.org/10.48550/arXiv.2412.18386)

We introduce SWITCH-A-VIEW, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video. The key insight of our approach is how to train such a model from unlabeled, but human-edited, video samples. We pose a pretext task that pseudo-labels segments in the training videos for their primary viewpoint (egocentric or exocentric), and then discovers the patterns between the visual and spoken content of a how-to video on the one hand and its view-switch moments on the other. Armed with this predictor, our model can be applied to new multi-view video settings to orchestrate which viewpoint should be displayed when, even when such settings come with limited labels. We demonstrate our idea on a variety of real-world videos from HowTo100M and Ego-Exo4D, and rigorously validate its advantages.
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos

Majumder, S; Nagarajan, T; Al-Halah, Z; Pradhan, R; Grauman, K (April 2025, https://doi.org/10.48550/arXiv.2411.08753)

Given a multi-view video, which viewpoint is most informative for a human observer? Existing methods rely on heuristics or expensive "best-view" supervision to answer this question, limiting their applicability. We propose a weakly supervised approach that leverages the language accompanying an instructional multi-view video as a means to recover its most informative viewpoint(s). Our key hypothesis is that the more accurately an individual view can predict a view-agnostic text summary, the more informative it is. To put this into action, we propose LangView, a framework that uses the relative accuracy of view-dependent caption predictions as a proxy for best-view pseudo-labels. Those pseudo-labels are then used to train a view selector, together with an auxiliary camera pose predictor that enhances view sensitivity. During inference, our model takes as input only a multi-view video (no language or camera poses) and returns the best viewpoint to watch at each timestep. On two challenging datasets composed of diverse multi-camera setups and how-to activities, our model consistently outperforms state-of-the-art baselines on both quantitative metrics and human evaluation.
Control of ¹⁶⁴Dy Bose-Einstein condensate phases and dynamics with dipolar anisotropy

https://doi.org/10.1103/PhysRevResearch.4.043124

Halder, S.; Mukherjee, K.; Mistakidis, S. I.; Das, S.; Kevrekidis, P. G.; Panigrahi, P. K.; Majumder, S.; Sadeghpour, H. R. (November 2022, Physical Review Research)

Spontaneous Formation of Star-Shaped Surface Patterns in a Driven Bose-Einstein Condensate

https://doi.org/10.1103/PhysRevLett.127.113001

Kwon, K.; Mukherjee, K.; Huh, S. J.; Kim, K.; Mistakidis, S. I.; Maity, D. K.; Kevrekidis, P. G.; Majumder, S.; Schmelcher, P.; Choi, J.-y. (September 2021, Physical Review Letters)

Parametrically excited star-shaped patterns at the interface of binary Bose-Einstein condensates

https://doi.org/10.1103/PhysRevA.102.033320

Maity, D. K.; Mukherjee, K.; Mistakidis, S. I.; Das, S.; Kevrekidis, P. G.; Majumder, S.; Schmelcher, P. (September 2020, Physical Review A)
